
Abstract. Extracting a subset of a given OWL ontology that captures
all the ontology's knowledge about a specified set of terms is a
well-understood task. This task can be based, for instance, on locality-based
modules (LBMs). These come in two flavours, syntactic and semantic,
and a syntactic LBM is known to contain the corresponding semantic
LBM. For syntactic LBMs, polynomial extraction algorithms are known,
implemented in the OWL API, and in use. In contrast, extracting
semantic LBMs involves reasoning, which is intractable for OWL 2 DL,
and these algorithms had not yet been implemented for expressive
ontology languages.

We present the first implementation of semantic LBMs and report on 
experiments that compare them with syntactic LBMs extracted from 
real-life ontologies. Our study reveals whether semantic LBMs are worth 
the additional extraction effort, compared with syntactic LBMs. 

1 Introduction

Extracting a subset of a given OWL ontology that captures all the ontology's
knowledge about a specified set of concept and role names is an interesting task
for various applications, and it is by now well-understood [2,10,11]. In general,
we consider a setting where, for a given signature, we want to determine a (small) 
subset of a given ontology such that any axiom over the signature entailed by 
the ontology is also entailed by the subset. For expressive logics, this task can 
be implemented by making use of the notion of locality, and results in what is 
known as locality-based modules (LBMs) [2]. Locality comes in many different
flavours; in particular, there are notions of syntactic and semantic locality. A
syntactic LBM is known to contain the corresponding semantic LBM, but might
also contain extra axioms which, because they are not in the semantic LBM, are
superfluous for entailments over the given signature. Algorithms for the
extraction of syntactic LBMs are known that run in time polynomial in the size
of the ontology (thus much cheaper than reasoning), are implemented in the OWL
API, and are in use. In contrast, although algorithms for extracting semantic
LBMs are known, to the best of our knowledge they had not been implemented
until now. Moreover, they involve entailment checking, and are thus intractable
for expressive profiles of OWL 2.

We present the first implementation of semantic LBMs and report on
experiments that compare them with syntactic LBMs extracted from real-life
ontologies. The contributions of this paper are as follows: we show with statistical
significance that, for almost all members of a large corpus of existing ontologies, 
there is no difference between any syntactic LBM and its corresponding semantic 
LBM. In the few cases where differences occur, these differences are modest and 
not worth the increased computation time needed to compute semantic LBMs. 
In addition, we isolate two types of axioms that lead to differences, where one 
is a simple tautology that can, in principle, be detected by a straightforward 
addition to the syntactic locality checker. Furthermore, our results show that 
the extraction of semantic LBMs, which is in principle hard, seems feasible in 
practice. The lesson we learn from these results is that "Cheap is Great"! 

2 Preliminaries 

We assume the reader to be familiar with OWL and the underlying description
logic $\mathcal{SROIQ}$ [1,8], and will define the central notions around locality-based
modularity [2].

Let $N_C$ be a set of concept names, and $N_R$ a set of role names. A signature
$\Sigma$ is a set of terms, i.e., a set $\Sigma \subseteq N_C \cup N_R$ of concept and role names. We can
think of a signature as specifying a topic of interest. Axioms that only use terms
from $\Sigma$ can be thought of as "on-topic", and all other axioms as "off-topic". For
instance, if $\Sigma = \{\text{Animal}, \text{Duck}, \text{Grass}, \text{eats}\}$, then $\text{Duck} \sqsubseteq \exists \text{eats}.\text{Grass}$ is on-topic,
while $\text{Duck} \sqsubseteq \text{Bird}$ is off-topic.

Any concept, role, or axiom that uses only terms from $\Sigma$ is called a $\Sigma$-concept,
$\Sigma$-role, or $\Sigma$-axiom. Given any such object $X$, we call the set of terms in $X$ the
signature of $X$ and denote it with $\widetilde{X}$.

Given an interpretation $\mathcal{I}$, we denote its restriction to the terms in a signature
$\Sigma$ with $\mathcal{I}|_\Sigma$. Two interpretations $\mathcal{I}$ and $\mathcal{J}$ are said to coincide on a signature $\Sigma$,
in symbols $\mathcal{I}|_\Sigma = \mathcal{J}|_\Sigma$, if $\Delta^{\mathcal{I}} = \Delta^{\mathcal{J}}$ and $X^{\mathcal{I}} = X^{\mathcal{J}}$ for all $X \in \Sigma$.

There are a number of variants of the notion of conservative extensions, which 
capture the desired preservation of knowledge to different degrees. We focus on 
the deductive variant. 

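Definition 1. Let $\mathcal{M} \subseteq \mathcal{O}$ be $\mathcal{SROIQ}$-ontologies and $\Sigma$ a signature.

(1) $\mathcal{O}$ is a deductive $\Sigma$-conservative extension ($\Sigma$-dCE) of $\mathcal{M}$ if, for all
$\mathcal{SROIQ}$-axioms $\alpha$ with $\widetilde{\alpha} \subseteq \Sigma$, it holds that $\mathcal{M} \models \alpha$ if and only if $\mathcal{O} \models \alpha$.

(2) $\mathcal{M}$ is a dCE-based module for $\Sigma$ of $\mathcal{O}$ if $\mathcal{O}$ is a $\Sigma$-dCE of $\mathcal{M}$.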

Unfortunately, deciding in general if a set of axioms is a module in this sense
is hard or even impossible for expressive DLs [6,12], and finding a minimal one
is even more so. However, "good sized" modules that are efficiently computable
have been introduced [2]. They are based on the locality of single axioms, which
means that, given $\Sigma$, the axiom can always be satisfied independently of the
interpretation of the $\Sigma$-terms, but in a restricted way: by interpreting all non-$\Sigma$
terms either as the empty set ($\emptyset$-locality) or as the full domain ($\Delta$-locality).

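Definition 2. A $\mathcal{SROIQ}$-axiom $\alpha$ is called $\emptyset$-local ($\Delta$-local) w.r.t. signature $\Sigma$
if, for each interpretation $\mathcal{I}$, there exists an interpretation $\mathcal{J}$ such that
$\mathcal{I}|_\Sigma = \mathcal{J}|_\Sigma$, $\mathcal{J} \models \alpha$, and for each $X \in \widetilde{\alpha} \setminus \Sigma$, $X^{\mathcal{J}} = \emptyset$
(for each $C \in \widetilde{\alpha} \setminus \Sigma$, $C^{\mathcal{J}} = \Delta$ and for each $R \in \widetilde{\alpha} \setminus \Sigma$, $R^{\mathcal{J}} = \Delta \times \Delta$).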

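It has been shown in [2] that $\mathcal{M} \subseteq \mathcal{O}$ and all axioms in $\mathcal{O} \setminus \mathcal{M}$ being $\emptyset$-local
(or all axioms being $\Delta$-local) w.r.t. $\Sigma \cup \widetilde{\mathcal{M}}$ is sufficient for $\mathcal{O}$ to be a $\Sigma$-dCE of
$\mathcal{M}$. The converse does not hold: e.g., the axiom $A \equiv B$ is neither $\emptyset$- nor $\Delta$-local
w.r.t. $\{A\}$, but the ontology $\{A \equiv B\}$ is an $\{A\}$-dCE of the empty ontology.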
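The counterexample can be unpacked step by step. The following is our own spelling-out of Definitions 1 and 2 for $\Sigma = \{A\}$, given as a sketch rather than as a derivation quoted from [2].

```latex
\begin{align*}
% Not \emptyset-local:
&\text{Take } \mathcal{I} \text{ with } A^{\mathcal{I}} \neq \emptyset.
 \text{ Any } \mathcal{J} \text{ with } \mathcal{I}|_\Sigma = \mathcal{J}|_\Sigma
 \text{ and } B^{\mathcal{J}} = \emptyset \text{ has }
 A^{\mathcal{J}} = A^{\mathcal{I}} \neq \emptyset = B^{\mathcal{J}},
 \text{ so } \mathcal{J} \not\models A \equiv B. \\
% Not \Delta-local:
&\text{Take } \mathcal{I} \text{ with } A^{\mathcal{I}} \neq \Delta^{\mathcal{I}}.
 \text{ Any } \mathcal{J} \text{ with } \mathcal{I}|_\Sigma = \mathcal{J}|_\Sigma
 \text{ and } B^{\mathcal{J}} = \Delta^{\mathcal{J}} \text{ has }
 A^{\mathcal{J}} \neq B^{\mathcal{J}},
 \text{ so } \mathcal{J} \not\models A \equiv B. \\
% Still an \{A\}-dCE of the empty ontology:
&\text{Every } \mathcal{I} \text{ extends to a model } \mathcal{J} \text{ of } A \equiv B
 \text{ with } \mathcal{I}|_\Sigma = \mathcal{J}|_\Sigma
 \text{ by setting } B^{\mathcal{J}} := A^{\mathcal{I}}. \\
&\text{Hence } \{A \equiv B\} \models \alpha \text{ implies } \emptyset \models \alpha
 \text{ for every axiom } \alpha \text{ with } \widetilde{\alpha} \subseteq \{A\}.
\end{align*}
```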

Furthermore, locality can be tested using available DL-reasoners [2], which 
makes this problem considerably easier than testing conservativity. However, 
reasoning in expressive DLs is still complex, e.g. N2ExpTime-complete for
$\mathcal{SROIQ}$ [9]. In order to achieve tractable module extraction, a syntactic
approximation of locality has been introduced in [2]. The following definition
captures only the case of $\mathcal{SHQ}$-TBoxes and can straightforwardly be extended to
$\mathcal{SROIQ}$ ontologies.

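Definition 3. An axiom $\alpha$ is called syntactically $\bot$-local ($\top$-local) w.r.t.
signature $\Sigma$ if it is of the form $C^\bot \sqsubseteq C$, $C \sqsubseteq C^\top$, $C^\bot \equiv C^\bot$, $C^\top \equiv C^\top$,
$R^\bot \sqsubseteq R$ ($R \sqsubseteq R^\top$), or $\mathrm{Trans}(R^\bot)$ ($\mathrm{Trans}(R^\top)$), where $C$ is an arbitrary concept, $R$ is an
arbitrary role name, $R^\bot \notin \Sigma$ ($R^\top \notin \Sigma$), and $C^\bot$ and $C^\top$ are from $\mathrm{Bot}(\Sigma)$ and
$\mathrm{Top}(\Sigma)$ as defined in Part (a) (resp. (b)) of the table below.

(a) $\bot$-Locality. Let $A^\bot, R^\bot \notin \Sigma$, $C^\bot \in \mathrm{Bot}(\Sigma)$, $C^\top_{(i)} \in \mathrm{Top}(\Sigma)$, $\bar{n} \in \mathbb{N} \setminus \{0\}$.

$\mathrm{Bot}(\Sigma) ::= A^\bot \mid \bot \mid \neg C^\top \mid C \sqcap C^\bot \mid C^\bot \sqcap C \mid \exists R.C^\bot \mid {\geq}\bar{n}\, R.C^\bot \mid \exists R^\bot.C \mid {\geq}\bar{n}\, R^\bot.C$
$\mathrm{Top}(\Sigma) ::= \top \mid \neg C^\bot \mid C^\top_1 \sqcap C^\top_2 \mid {\geq}0\, R.C$

(b) $\top$-Locality. Let $A^\top, R^\top \notin \Sigma$, $C^\bot \in \mathrm{Bot}(\Sigma)$, $C^\top_{(i)} \in \mathrm{Top}(\Sigma)$, $\bar{n} \in \mathbb{N} \setminus \{0\}$.

$\mathrm{Bot}(\Sigma) ::= \bot \mid \neg C^\top \mid C \sqcap C^\bot \mid C^\bot \sqcap C \mid \exists R.C^\bot \mid {\geq}\bar{n}\, R.C^\bot$
$\mathrm{Top}(\Sigma) ::= A^\top \mid \top \mid \neg C^\bot \mid C^\top_1 \sqcap C^\top_2 \mid \exists R^\top.C^\top \mid {\geq}\bar{n}\, R^\top.C^\top \mid {\geq}0\, R.C$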



It has been shown in [2] that $\bot$-locality ($\top$-locality) of an axiom $\alpha$ w.r.t.
$\Sigma$ implies $\emptyset$-locality ($\Delta$-locality) of $\alpha$ w.r.t. $\Sigma$. Therefore, all axioms in $\mathcal{O} \setminus \mathcal{M}$
being $\bot$-local (or all axioms being $\top$-local) w.r.t. $\Sigma \cup \widetilde{\mathcal{M}}$ is sufficient for $\mathcal{O}$ to
be a $\Sigma$-dCE of $\mathcal{M}$. The converse does not hold; examples can be found in [2].
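Part (a) of Definition 3 is a pair of mutually recursive grammars, which translates directly into two mutually recursive predicates. The following is a minimal sketch for the $\mathcal{SHQ}$ fragment only; the tuple encoding and the names `in_bot`, `in_top` and `is_bot_local` are assumptions of this illustration and do not reproduce the OWL API's locality checker.

```python
# Hypothetical encoding (not the OWL API's):
#   "A"                       atomic concept name
#   ("top",), ("bot",)        the concepts ⊤ and ⊥
#   ("not", C), ("and", C, D), ("some", R, C), ("min", n, R, C)
# Axioms: ("sub", C, D), ("equiv", C, D), ("subrole", R, S), ("trans", R)

def in_bot(c, sigma):
    """True if concept c is generated by Bot(Σ) in Part (a)."""
    if isinstance(c, str):
        return c not in sigma                       # A^⊥ with A ∉ Σ
    op = c[0]
    if op == "bot":
        return True
    if op == "not":
        return in_top(c[1], sigma)                  # ¬C^⊤
    if op == "and":
        return in_bot(c[1], sigma) or in_bot(c[2], sigma)
    if op == "some":
        return c[1] not in sigma or in_bot(c[2], sigma)
    if op == "min":
        n, role, filler = c[1], c[2], c[3]
        return n >= 1 and (role not in sigma or in_bot(filler, sigma))
    return False

def in_top(c, sigma):
    """True if concept c is generated by Top(Σ) in Part (a)."""
    if isinstance(c, str):
        return False
    op = c[0]
    if op == "top":
        return True
    if op == "not":
        return in_bot(c[1], sigma)                  # ¬C^⊥
    if op == "and":
        return in_top(c[1], sigma) and in_top(c[2], sigma)
    if op == "min":
        return c[1] == 0                            # ≥0 R.C
    return False

def is_bot_local(axiom, sigma):
    """Syntactic ⊥-locality of an axiom w.r.t. sigma (Definition 3)."""
    op = axiom[0]
    if op == "sub":
        return in_bot(axiom[1], sigma) or in_top(axiom[2], sigma)
    if op == "equiv":
        both_bot = in_bot(axiom[1], sigma) and in_bot(axiom[2], sigma)
        both_top = in_top(axiom[1], sigma) and in_top(axiom[2], sigma)
        return both_bot or both_top
    if op in ("subrole", "trans"):
        return axiom[1] not in sigma                # R^⊥ ⊑ R, Trans(R^⊥)
    raise ValueError(f"unsupported axiom: {axiom!r}")

sigma = {"Duck", "eats"}
print(is_bot_local(("sub", "Duck", "Bird"), sigma))    # Duck ∈ Σ on the left
print(is_bot_local(("sub", "Bird", "Animal"), sigma))  # Bird ∉ Σ on the left
print(is_bot_local(("equiv", "A", "B"), {"A"}))        # the A ≡ B example
```

Each call runs in time linear in the size of the axiom, which is what makes the syntactic check cheap compared with a reasoner-based test.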

For each of the four locality notions, modules of $\mathcal{O}$ are obtained by starting
with an empty set of axioms and subsequently adding axioms from $\mathcal{O}$ that are
$\Sigma$-non-local. In order for this procedure to be correct, the signature against which
locality is checked has to be extended with the terms in the axioms that are
added in each step, so that the resulting module $\mathcal{M}$ consists of all the non-local
axioms with respect to $\Sigma \cup \widetilde{\mathcal{M}}$. Definition 4 (1) introduces locality-based
modules, which are always dCE-based modules [2], although not necessarily minimal
ones. Modules based on syntactic (semantic) locality can be made smaller by
iteratively nesting $\top$- and $\bot$-extraction ($\Delta$- and $\emptyset$-extraction), and the result
is still a dCE-based module [2,13]. These so-called $\top\bot^*$-modules ($\Delta\emptyset^*$-modules)
are introduced in Definition 4 (3).

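Definition 4. Let $x \in \{\emptyset, \Delta, \bot, \top\}$, $yz \in \{\top\bot, \Delta\emptyset\}$, $\mathcal{O}$ an ontology and $\Sigma$ a
signature.

(1) An ontology $\mathcal{M}$ is the $x$-module of $\mathcal{O}$ w.r.t. $\Sigma$ if it is the output of
Algorithm 1. We write $\mathcal{M} = x\text{-mod}(\Sigma, \mathcal{O})$.

(2) An ontology $\mathcal{M}$ is the $yz$-module of $\mathcal{O}$ w.r.t. $\Sigma$, written $\mathcal{M} = yz\text{-mod}(\Sigma, \mathcal{O})$,
if $\mathcal{M} = y\text{-mod}(\Sigma, z\text{-mod}(\Sigma, \mathcal{O}))$.

(3) Let $(\mathcal{M}_i)_{i \geq 0}$ be a sequence of ontologies such that $\mathcal{M}_0 = \mathcal{O}$ and
$\mathcal{M}_{i+1} = yz\text{-mod}(\Sigma, \mathcal{M}_i)$ for every $i \geq 0$. For the smallest $n \geq 0$ with $\mathcal{M}_n = \mathcal{M}_{n+1}$,
we call $\mathcal{M}_n$ the $yz^*$-module of $\mathcal{O}$ w.r.t. $\Sigma$, written $\mathcal{M}_n = yz^*\text{-mod}(\Sigma, \mathcal{O})$.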



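Algorithm 1 Extract a locality-based module

Input: Ont. $\mathcal{O}$, sig. $\Sigma$, $x \in \{\emptyset, \Delta, \bot, \top\}$    Output: $x$-module $\mathcal{M}$ of $\mathcal{O}$ w.r.t. $\Sigma$

$\mathcal{M} \leftarrow \emptyset$; $\mathcal{O}' \leftarrow \mathcal{O}$
repeat
    changed $\leftarrow$ false
    for all $\alpha \in \mathcal{O}'$ do
        if $\alpha$ not $x$-local w.r.t. $\Sigma \cup \widetilde{\mathcal{M}}$ then
            $\mathcal{M} \leftarrow \mathcal{M} \cup \{\alpha\}$; $\mathcal{O}' \leftarrow \mathcal{O}' \setminus \{\alpha\}$; changed $\leftarrow$ true
until changed = false
return $\mathcal{M}$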
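Algorithm 1 can be sketched in a few lines once the locality test is treated as a parameter. In this illustration, `extract_module`, `is_local` and `signature_of` are hypothetical names, and the toy locality test at the end only covers atomic inclusions $A \sqsubseteq B$ (which are $\bot$-local exactly when $A$ is outside the signature); it stands in for a real syntactic or reasoner-based check.

```python
def extract_module(ontology, sigma, is_local, signature_of):
    """Sketch of Algorithm 1 with a pluggable locality test.

    is_local(axiom, signature) -> bool and signature_of(axiom) -> set
    are supplied by the caller.
    """
    module = []
    remaining = list(ontology)
    extended = set(sigma)            # Σ ∪ signature of the module so far
    changed = True
    while changed:
        changed = False
        for axiom in list(remaining):        # iterate over a copy
            if not is_local(axiom, extended):
                module.append(axiom)
                remaining.remove(axiom)
                extended |= signature_of(axiom)
                changed = True
    return module

# Toy setting: an axiom (A, B) stands for the atomic inclusion A ⊑ B.
toy_ontology = [("Duck", "Bird"), ("Bird", "Animal"), ("Grass", "Plant")]
toy_is_bot_local = lambda axiom, sig: axiom[0] not in sig
toy_signature = lambda axiom: set(axiom)

module = extract_module(toy_ontology, {"Duck"}, toy_is_bot_local, toy_signature)
print(module)
```

In the toy run the module for {Duck} collects the superconcept chain of Duck and leaves the unrelated axiom about Grass out.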



As for (1), it has been shown in [2] that the output $\mathcal{M}$ of Algorithm 1 does
not depend on the order in which the axioms $\alpha$ are selected. Furthermore,
the integer $n$ in (3) exists because the sequence $(\mathcal{M}_i)_{i \geq 0}$ is decreasing (more
precisely, we have $\mathcal{M}_0 \supseteq \dots \supseteq \mathcal{M}_n = \mathcal{M}_{n+1} = \dots$). Due to monotonicity
properties of locality-based modules, the dual notions of $\bot\top^*$- and $\emptyset\Delta^*$-modules
are uninteresting because they coincide with those of $\top\bot^*$- and $\Delta\emptyset^*$-modules.
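The fixpoint iteration of Definition 4 (3) can be sketched as follows. The function name `nested_star_module` and the assumption that both extractors have the shape `extract(axioms, sigma) -> list` are ours; any pair of module extractors with that shape could be plugged in.

```python
def nested_star_module(ontology, sigma, extract_y, extract_z):
    """Iterate yz-mod until the module no longer shrinks.

    extract_y and extract_z are assumed to return a subset of the axioms
    they are given, so comparing sizes is enough to detect the fixpoint.
    """
    current = list(ontology)
    while True:
        shrunk = extract_y(extract_z(current, sigma), sigma)
        if len(shrunk) == len(current):
            return shrunk
        current = shrunk
```

Because the sequence of modules is decreasing, the loop performs at most as many iterations as the ontology has axioms.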

Roughly speaking, a $\Delta$- or $\top$-module for $\Sigma$ gives a view from above because
it contains all subconcepts of concept names in $\Sigma$, while a $\emptyset$- or $\bot$-module for $\Sigma$ gives
a view from below since it contains all superconcepts of concept names in $\Sigma$.

Modulo the locality check, Algorithm 1 runs in time cubic in $|\mathcal{O}| + |\Sigma|$ [2].
Modules based on $\bot$/$\top$-locality are therefore a feasible approximation for
modules based on $\emptyset$/$\Delta$-locality. In both cases, modules are extracted axiom by axiom
but, as said above, the $\emptyset$/$\Delta$-locality check is more complex. A module extractor
is implemented in the OWL API$^{6}$ and SSWAP$^{7}$. To summarize:

1. Given an ontology $\mathcal{O}$, the semantic module $\mathcal{M}^{\mathrm{sem}}_\Sigma$ for a signature $\Sigma$ is
   contained in the corresponding syntactic module $\mathcal{M}^{\mathrm{syn}}_\Sigma$ for the same seed
   signature.$^{8}$ This means that, in principle, syntactic modules can contain
   more axioms that are unnecessary for preserving entailments over $\Sigma$ than
   semantic modules.

2. The extraction of a syntactic module can be done in polynomial time w.r.t.
   the size of the ontology $\mathcal{O}$. In contrast, the extraction of a semantic module
   is as hard as reasoning.

3 Experimental design 

The main aim of this paper is to investigate how well syntactic locality
approximates semantic locality. In particular, we want to see how (un)likely it is that
syntactic locality-based modules are larger than semantic locality-based ones 
and how large these differences are. We also want to understand empirically how 
much more costly semantic locality is in terms of performance. 

Selection of the Corpus. For our experiments, we have built a corpus containing:
(1) from the TONES repository,$^{9}$ those ontologies that have already been studied
in previous work on modularity [4]: Koala, Mereology, University, People,
miniTambis, OWL-S, Tambis, Galen; (2) all ontologies from the NCBO BioPortal
ontology repository.$^{10}$

We then filter out all those ontologies for which at least one of the
following problems occurs: the ontology is impossible to download; the .owl file
is corrupted when downloaded; the file is not parseable; the ontology is
inconsistent. Furthermore, due to time constraints, we exclude from this preliminary
investigation all ontologies whose size exceeds 10,000 axioms.

This selection results in a corpus of 156 ontologies, which greatly differ in
size and expressivity [7], as summarized in Table 1. For a full list of the corpus,
please refer to the Appendix.



Repository   Range of expressivity        Range #axs.   Range sig. size

TONES        ALCN - SHIN(D)/SOIN(D)       38-4,735      21-3,161
BioPortal    AL - SHOIN(D)/SHOIQ(D)       13-9,629      14-9,221

Table 1. Ontology corpus



6 http://owlapi.sourceforge.net 

7 http://sswap.info 

8 Recall that $\bot$-syntactic modules approximate $\emptyset$-semantic modules, while $\top$-syntactic
modules approximate $\Delta$-semantic modules.

9 http://owl.cs.manchester.ac.uk/repository/

10 http://bioportal.bioontology.org
Comparing Syntactic and Semantic Locality. In order to compare syntactic and
semantic locality, we want to understand:

1. whether, for a given seed signature $\Sigma$, the semantic $\Sigma$-module is likely to be
   smaller than the syntactic $\Sigma$-module, and if so by how much,

2. how feasible the extraction of semantic modules is.

Here, we focus on the two corresponding notions of $\emptyset$-semantic locality and
$\bot$-syntactic locality. In particular, $\bot$-syntactic locality has been thoroughly
investigated in previous work [3], and it has proven to have many interesting
properties. A completion of the investigation described in this paper for all
fundamental notions of modules is planned in our future work.

Due to the recursive nature of the locality-based module extraction algorithm, 
we want to investigate locality both on a 

– per-axiom basis: given an axiom $\alpha$ and a signature $\Sigma$, is it likely that $\alpha$ is
  semantically $\emptyset$-local w.r.t. $\Sigma$ but not syntactically $\bot$-local w.r.t. $\Sigma$?

– per-module basis: given a signature $\Sigma$, is it likely that
  $\bot\text{-mod}(\Sigma, \mathcal{O}) \neq \emptyset\text{-mod}(\Sigma, \mathcal{O})$? If yes, is it likely that the difference is large?

Hence we need to pick, for each ontology in our corpus, a suitable set of
signatures, and this poses a significant problem. First, we do not yet have enough
insight into what typical seed signatures are for module extraction. One could
assume that large ones are rarely relevant for module extraction (why bother
with extracting a large module?), but this still leaves a large, i.e., exponential,
space of possible seed signatures. If $m = \#\widetilde{\mathcal{O}}$, there are $2^m$ possible seed
signatures for which axioms can be tested for locality and for which modules can be
extracted. Hence a full investigation is infeasible.

One could assume that the comparison between semantic and syntactic mod- 
ules could be easier since many signatures can lead to the same module. In other 
words, the statistically significant number of modules w.r.t. the total number 
of modules is not larger than that of seed signatures needed w.r.t. the total 
number of seed signatures. In previous work [4,5], however, modules have been 
studied with respect to how numerous they are in real-world ontologies. The 
experiments carried out suggest that the number of modules in ontologies is, in 
general, exponential w.r.t. the size of the ontology. Moreover, the extraction of 
enough different modules can be hard, because by looking just at seed signatures 
there is no chance to avoid the extraction of the same module many times. In 
particular, for a module $\mathcal{M}$ there can be exponentially many seed signatures
w.r.t. $\#\mathcal{M}$ that generate $\mathcal{M}$ [3].

As a consequence, we compare the two kinds of locality of axioms, both
on a per-axiom basis and a per-module basis, w.r.t. random signatures. To
avoid any bias, we select a random signature as follows: we set each named
entity $E$ in the ontology to have probability $p = 1/2$ of being included in the
signature. Thus each seed signature has the same probability of being chosen. For
ontologies whose signature exceeds 9 entities, in order to get results where the
true proportion of differences between the two notions of locality lies in the
confidence interval ($\pm 5\%$) with confidence level 95%, we have to select only 400
random signatures [14]. That is, we need to test only 400 random signatures to
have a confidence of 95% ($\pm 5\%$) that the differences/equalities we observe reflect
the real ones.
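The figure of 400 signatures can be related to two textbook sample-size formulas. The sketch below is an illustration under assumptions: we do not know which formula [14] prescribes, and the function names are ours. The first function is the usual normal-approximation bound for estimating a proportion (with the worst case $p = 0.5$); the second is the simplified finite-population formula often attributed to Yamane, $n = N / (1 + N e^2)$, whose limit for large $N$ is $1/e^2$.

```python
import math
from statistics import NormalDist

def proportion_sample_size(confidence=0.95, margin=0.05, p=0.5):
    """Normal-approximation sample size for estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided quantile
    return z * z * p * (1 - p) / (margin * margin)

def finite_population_sample_size(population, margin=0.05):
    """Simplified finite-population formula n = N / (1 + N * e^2)."""
    return population / (1 + population * margin * margin)

print(math.ceil(proportion_sample_size()))                # below 400
print(math.ceil(finite_population_sample_size(2 ** 20)))  # approaches 1/e^2
```

Under either reading, 400 samples are enough for a 95% confidence level with a margin of error of 5 percentage points, independently of how large the space of $2^m$ signatures is.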

Non-random seed signatures. A module, in general, does not necessarily show any 
internal coherence: intuitively, if we had an ontology describing some knowledge 
from both the domains of Geology and of Philosophy, we could still extract the 
module for the signature $\Sigma = \{\text{Epistemology}, \text{Mineral}\}$. This module is likely
to be the union of the two disjoint modules for $\Sigma_1 = \{\text{Epistemology}\}$ and
$\Sigma_2 = \{\text{Mineral}\}$. This combinatorial behaviour can lead to exponentially many
modules in the size of the signature of the ontology and indeed, as mentioned 
above, the number of modules in ontologies seems to be exponential [4,5]. 

In contrast to general modules, genuine modules can be called coherent: they 
are defined as those modules that cannot be decomposed into the union of two 
different modules. Notably, there are only linearly many genuine modules in the 
size of the ontology O, and the set of genuine modules is a base for all general 
modules: any module is either genuine or the union of genuine modules. The 
linear bound on the number of genuine modules is due to the fact that, for each
genuine $x$-module $\mathcal{M}$, there is an axiom $\alpha$ such that $\mathcal{M} = x\text{-mod}(\widetilde{\alpha}, \mathcal{O})$.
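The linear bound suggests a simple enumeration: extract one module per axiom signature and group axioms that yield the same module. The sketch below assumes an extractor of the shape `extract(axioms, sigma) -> list` and a function `signature_of(axiom) -> set`; both parameters and the name `axiom_signature_modules` are hypothetical.

```python
def axiom_signature_modules(ontology, extract, signature_of):
    """Map each distinct module x-mod(sig(axiom), O) to the axioms that seed it.

    The number of keys is at most the number of axioms, and by the
    property stated above every genuine module occurs among the keys.
    """
    modules = {}
    for axiom in ontology:
        module = frozenset(extract(ontology, signature_of(axiom)))
        modules.setdefault(module, []).append(axiom)
    return modules
```

The loop calls the extractor once per axiom, so its cost is the number of axioms times the cost of one extraction.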

Thus genuine modules can be said to be interesting modules that we can
fully investigate. Hence, in addition to the above-mentioned investigation of $\bot$-
and $\emptyset$-modules for random signatures, we also look at all axiom signatures.

In summary, we test: 

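(T1) for random seed signatures $\Sigma$,

    (a) for each axiom $\alpha$ in our corpus, is $\alpha$ semantically $\emptyset$-local w.r.t. $\Sigma$ but
        not syntactically $\bot$-local w.r.t. $\Sigma$?

    (b) is $\bot\text{-mod}(\Sigma, \mathcal{O}) \neq \emptyset\text{-mod}(\Sigma, \mathcal{O})$? If yes, we determine the difference and
        its size.

(T2) for each axiom signature $\widetilde{\alpha}$ from our corpus, is $\bot\text{-mod}(\widetilde{\alpha}, \mathcal{O}) \neq \emptyset\text{-mod}(\widetilde{\alpha}, \mathcal{O})$?
     If yes, we determine the difference and its size.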

4 Experimental comparison 

No differences. The main result of the experiment is that, for 151 of the 156
ontologies we tested, no difference between $\bot$- and $\emptyset$-locality can be observed.
These 151 ontologies exclude the two NCBO BioPortal ontologies EFO
(Experimental Factor Ontology) and SWO (Software Ontology), as well as Koala,
miniTambis, and Tambis. More specifically, for every generated seed signature,
the corresponding $\bot$- and $\emptyset$-modules agree, and every axiom is either both $\bot$- and
$\emptyset$-local, or neither. This statement applies to all randomly generated seed
signatures as well as to all axiom signatures, which are seed signatures for all
genuine modules. We can therefore draw the following conclusions for the 151
ontologies with respect to (T1) and (T2) above.

(T1) Given an arbitrary seed signature $\Sigma$, there is no difference (a) between
$\bot$- and $\emptyset$-locality of any given axiom w.r.t. $\Sigma$ and (b) between the $\bot$- and
$\emptyset$-modules for $\Sigma$, both times at a significance level of 0.05.

(T2) Given any axiom signature $\Sigma$, there is no difference between the $\bot$- and
$\emptyset$-modules for $\Sigma$.

In the case of the 151 ontologies, the extraction of a $\emptyset$-module (with tautology
tests performed by FaCT++) often took considerably longer than the extraction
of the corresponding $\bot$-module. For example, for MoleculeRole, the largest of
the 151 ontologies, times to extract a $\bot$-module (test all axioms for $\bot$-locality,
respectively) ranged between 27 and 169 ms (21 and 77 ms, respectively), while
the extraction of a $\emptyset$-module (test of all axioms for $\emptyset$-locality, resp.) took up
to 6× as long, on average 2.7× (2.0×, resp.). It is also worth noting that the
ontologies Galen and People, which are renowned for having particularly large
$\bot$-modules [2,5], are among those without differences between $\bot$- and $\emptyset$-locality.

Differences. For the five ontologies where differences between $\bot$- and $\emptyset$-modules
(or -locality) occur, we isolated two types of culprits: axioms which are not
$\bot$-local w.r.t. some signature $\Sigma$, but which are $\emptyset$-local w.r.t. $\Sigma$. Type-1 culprits
are simple tautologies that have accidentally entered the "inferred view" (i.e.,
the closure under certain entailments) of two ontologies. They do not occur in the
original "asserted" versions and can, in principle, be detected by a slightly refined
syntactic locality check. Type-2 culprits are definitions of concept names via a
conjunction that satisfies certain conditions explained below. There are not many
type-1 and type-2 axioms in the affected ontologies, and the observed differences
are comparatively small. Table 2 gives an overview of the differences observed.

Type-1 culprits are axioms InverseObjectProperties(P, InverseOf(P)),
where P is a role. This translates into the tautology $P \equiv (P^-)^-$ in DL
notation. Such an axiom is therefore $\emptyset$-local w.r.t. any signature. However, it behaves
differently for $\bot$-locality: if the signature $\Sigma$ contains $P$, then both sides of the
equation are neither in $\mathrm{Bot}(\Sigma)$ nor in $\mathrm{Top}(\Sigma)$, hence the axiom is considered
non-local; otherwise, both sides are $\bot$-equivalent, hence the axiom is local.
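One way such a refinement of the syntactic check could look is to cancel nested inverses before comparing the two sides of the axiom. This is a minimal sketch under our own assumptions: role expressions are encoded as a name or a tuple `("inv", role)`, and the function names are hypothetical; it is not the OWL API's locality checker.

```python
def normalize_role(role):
    """Cancel pairs of nested inverses, so that (P^-)^- becomes P."""
    inversions = 0
    while isinstance(role, tuple) and role[0] == "inv":
        inversions += 1
        role = role[1]
    return ("inv", role) if inversions % 2 else role

def is_trivial_inverse_axiom(first, second):
    """InverseObjectProperties(first, second) states first ≡ second^-.

    The axiom is a tautology if both sides normalize to the same role.
    """
    return normalize_role(first) == normalize_role(("inv", second))

print(is_trivial_inverse_axiom("P", ("inv", "P")))  # the type-1 culprit
print(is_trivial_inverse_axiom("P", "Q"))           # a genuine inverse axiom
```

An axiom recognized in this way could be treated as local w.r.t. every signature, which matches its $\emptyset$-locality.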

Type-1 axioms occur in the "inferred view" of the ontologies EFO and SWO. 
Table 2 shows the relatively modest differences caused by these axioms. In all 
cases, there are no other axioms in the differences. This means that no differences 
occur for the non-inferred original versions of EFO and SWO. 

Ontology     #axs   Test   #differences   Difference sizes            Time ratio    Culprit type
                                          abs.          rel.          (avg.)        and frequency

SWO          3446   T1a    400            6-22          0-1%          3.31          1 (30×)
                    T1b    400            [illegible]   [illegible]   [illegible]
                    T2     [illegible]    [illegible]   [illegible]   [illegible]
EFO          6008   T1a    400            8-24          0-1%          1.42          1 (32×)
                    T1b    400            13-30         0-1%          1.38
                    T2     128            1-4           9-17%         —
Koala        42     T1a                                 0%            —             2 (1×)
                    T1b    2              1             3%            —
                    T2                                  0%            —
miniTambis   170    T1a    68             1-2           1-3%          —             2 (3×)
                    T1b    93             1-4           1-3%          —
                    T2     26             1-7           6-75%         —
Tambis       592    T1a    58             1-3           0-1%          3.31          2 (11×)
                    T1b    229            2-11          0-2%          5.01
                    T2     191            4-41          2-26%         —

Table 2. Overview of differences observed. The columns show: the ontology name;
the overall number of axioms; the name of the test (see the list at the end of
Section 3); the number of cases with differences; the number of axioms in the
differences (absolute and relative to the $\bot$-case); the average time ratio
$\emptyset$ : $\bot$ ("—" indicates that no reliable statement is possible: the time for $\bot$ is
only a few, often 0, milliseconds); the type of culprit present and the number of
axioms of this type.

Type-2 culprits are complex definitions $A \equiv C$ of a concept name $A$ where
$C$ is a conjunction that contains both a universal and an existential (or
minimum cardinality) restriction on the same role. This affects the ontologies Koala,
miniTambis, and Tambis. The effect is best illustrated for Koala, which contains
exactly one such axiom, namely $M \equiv S \sqcap \forall c.F \sqcap \forall g.\{m\} \sqcap ({=}3\, c.\top)$, where we
have abbreviated the concept names MaleStudentWith3Daughters, Student,
Female, the roles hasChildren, hasGender, and the nominal male. Now if the
signature against which the axiom is tested for locality contains $\{S, c, g\}$ but
neither $M$ nor $F$, then this axiom is not $\bot$-local because none of the conjuncts on
the right-hand side is in $\mathrm{Bot}(\Sigma)$. On the other hand, this axiom is a tautology
when $M$ and $F$ are replaced by $\bot$: the conjunction $\forall c.\bot \sqcap ({=}3\, c.\top)$ cannot have any
instances, regardless of how $c$ is interpreted.
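The semantic argument for the Koala axiom can be written out as follows. This is our own unpacking of Definition 2 for a signature containing $S$, $c$, $g$ but neither $M$ nor $F$, with $\mathcal{J}$ denoting any interpretation that maps $M$ and $F$ to the empty set.

```latex
\begin{align*}
&\text{With } M^{\mathcal{J}} = F^{\mathcal{J}} = \emptyset \text{, the axiom reads }
 \bot \equiv S \sqcap \forall c.\bot \sqcap \forall g.\{m\} \sqcap ({=}3\, c.\top). \\
&x \in (\forall c.\bot)^{\mathcal{J}} \iff x \text{ has no } c\text{-successor in } \mathcal{J},
 \qquad
 x \in ({=}3\, c.\top)^{\mathcal{J}} \iff x \text{ has exactly three } c\text{-successors in } \mathcal{J}. \\
&\text{Hence } (\forall c.\bot \sqcap ({=}3\, c.\top))^{\mathcal{J}} = \emptyset
 \text{, the right-hand side is empty, and the axiom holds in every such } \mathcal{J}.
\end{align*}
```

The syntactic check misses this because it inspects each conjunct in isolation, whereas the emptiness only arises from the interaction of two conjuncts on the same role.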

For Koala, this effect only causes two singleton differences between sets of 
local axioms for the randomly generated seed signatures, as shown in Table 2. 
For axiom signatures, there is no difference. Interestingly, this effect does not 
propagate to modules: for all signatures, $\bot$- and $\emptyset$-modules are the same. The
reason might be that (a) g is used in many axioms and is thus very likely to 
contribute to the extended signature during module extraction, and (b) then the 
axiom defining F is no longer local, which "pulls" F into the extended signature, 
preventing the observed effect. 

In miniTambis and Tambis, this effect is much stronger and affects a large 
proportion of modules, as shown in Table 2. The differences in these cases do 
not only consist of culprit axioms, but also of axioms that become non-local 
after the signature has been extended by the terms in the culprit axioms. Still, 
the size of the differences is mostly modest while, for Tambis, the $\emptyset$-locality test
($\emptyset$-module extraction) takes on average over three times (five times) as long as
the $\bot$-locality test ($\bot$-module extraction).
5 Conclusion and Outlook

Summary. We obtain two main observations from the experiments carried out. 

— In practice, there is no or little difference between semantic and syntactic 
locality. That is, the computationally cheaper syntactic locality is a good 
approximation of semantic locality. 

— Though in principle hard to compute, semantic modules can be extracted 
rather fast in practice. 

These results suggest that it is questionable to conclude that semantic locality 
should be preferred to syntactic locality. In terms of computation time, there is 
often a benefit in using syntactic locality: the average speed-up compared to the 
extraction of a semantic-locality based module is by a factor of up to 6. For 
some particular module pairs, it is higher by an order of magnitude. The gain 
in module size is zero or so small that it is hard to justify the extra time spent. 
In particular, there is no gain in size for the ontologies Galen and People, which 
are "renowned" for having disproportionately large modules [2,5]. 

Our results are interesting not only because they provide an evaluation of
how well the cheap syntactic locality approximates semantic locality, but also
because they enabled us to fix bugs in the implementation of syntactic
modularity. For example, earlier data from the experiment showed that reflexivity
axioms had been treated incorrectly by the syntactic locality checker.

Future Work. It is evident that this work is preliminary. It investigates only
the differences between the related notions of $\bot$- and $\emptyset$-locality. We plan to
extend the same study to other notions of locality, in particular, nested modules
($\top\bot^*$- vs. $\Delta\emptyset^*$-modules); these notions are the most economical in terms of
module size. Moreover, we want to extend the investigation to the remaining
larger ontologies in the BioPortal repository and further large ontologies, e.g.,
some versions of the NCI Thesaurus$^{12}$. Preliminary results with a version that
is not among the regular releases show differences due to type-2 culprits, but we
have not included them here because the differences disappear after removing
axioms that were introduced due to a problem with object and annotation
properties when the ontology file is parsed by the OWL API. This behaviour is yet to
be investigated and explained.

12 Downloadable from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus

Another interesting extension is to modify the seed signature sampling.
Currently, the random variable "size of the seed signature generated" follows the
binomial distribution with expected value $m/2$ and variance $m/4$. Hence, most
signatures in the sample have size around $m/2$; small and large signatures are
underrepresented. For example, for one ontology with 915 terms, all signature sizes
lay between 422 and 509. One might argue that, for big ontologies, the typical
module extraction scenario does not require large seed signatures, but it does
sometimes require relatively small seed signatures, for example, when a module
is extracted to efficiently answer a given entailment query of typically small size.
On the other hand, large modules resulting from larger seed signatures may be
more likely to differ. We therefore plan an alternative seed signature sampling
via bins for average signature sizes: repeat the current sampling procedure scaled
to several subintervals of the range of possible signature sizes.
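The concentration of signature sizes around $m/2$ follows directly from the binomial distribution, and one possible reading of the planned binned sampling is to vary the inclusion probability. The sketch below is an illustration only: the function names are ours, and using one inclusion probability per bin is an assumption about how the procedure might be scaled, not the authors' design.

```python
import math
import random

def sample_signature(entities, p, rng):
    """Include each entity independently with probability p."""
    return {entity for entity in entities if rng.random() < p}

def size_mean_and_sd(m, p=0.5):
    """Mean and standard deviation of the signature size, Binomial(m, p)."""
    return m * p, math.sqrt(m * p * (1 - p))

mean, sd = size_mean_and_sd(915)
print(mean, round(sd, 2))   # sizes 422 and 509 lie within about 3.5 sd of the mean

# Assumed variant: one inclusion probability per bin, so that the expected
# sizes m * p are spread over the whole range of possible signature sizes.
rng = random.Random(0)
entities = [f"term{i}" for i in range(915)]
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, len(sample_signature(entities, p, rng)))
```

With $p = 1/2$ the standard deviation for 915 terms is about 15, which is why no sampled signature falls far from 457 terms.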

Our current results answer the question whether there is a significant differ- 
ence between the two locality notions with respect to a given signature. It is also 
interesting to ask the same question relative to a given module. To answer it, the 
sampling of modules instead of seed signatures requires further investigation. 

Acknowledgment. We thank Rafael Gonçalves for helpful comments.
Appendix: overview of the ontologies used

Ontology                                      Expressivity    #Axioms   Sig. size

aba-adult-mouse-brain                         ALCI            3,441     915
adverse-event-reporting-ontology              SHOIN(D)        574       503
african-traditional-medicine                  ALE             208       225
amino-acid                                    ALCF(D)         477       52
amphibian-gross-anatomy                       ALE             2,673     1,647
anatomical-entity-ontology                    ALE             352       359
ascomycete-phenotype-ontology                 AL              294       329
basic-formal-ontology                         ALC             95        39
basic-vertebrate-anatomy                      SHIF            388       231
bilateria-anatomy                             ALEH+           138       121
bioinformatics-data-formats-identifiers...    ALE+            3,803     2,844
biological-imaging-methods                    S               548       626
biomedical-resource-ontology                  SHIF(D)         681       672
biopax                                        SHIN(D)         391       165
biotop                                        [illegible]     680       404
birnlex                                       AL              3,572     3,589
bleeding-history-phenotype                    ALCIF(D)        1,925     582
body-system                                   AL              28        30
breast-tissue-cell-lines                      ALCH(D)         2,734     412
brenda-tissue-enzyme-source                   ALE             6,284     5,272
c-elegans-development                         AL              71        73
c-elegans-phenotype                           AL+             2,279     2,026
cao                                           SHIQ(D)         476       290
cell-behavior-ontology                        [illegible]     13        14
cell-type                                     ALC             2,975     2,012
cereal-plant-development                      ALE             235       237
cereal-plant-gross-anatomy                    ALE+            1,839     1,173
cognitive-atlas                               ALC             3,622     1,585
common-anatomy-reference-ontology             ALE+            54        54
common-terminology-criteria-for-adverse..     AL(D)           6,940     3,889
dendritic-cell                                ALC             313       192
dikb-evidence-ontology                        ALCHOIN(D)      640       251
drosophila-development                        ALEH+           410       138
electrocardiography-ontology                  ALCIF(D)        1,274     1,171
environment-ontology                          S               1,807     1,574
epilepsy                                      ALH(D)          145       148
event-inoh-pathway-ontology                   ALEH+           7,131     3,836
evidence-codes                                ALE             342       268
exo                                           ALE+            85        121
experimental-factor-ontology                  [illegible]     6,008     4,869
fda-medical-devices-2010                      AL              4,907     4,941
fly-taxonomy                                  AL              6,587     6,599
flybase-controlled-vocabulary                 ALE+            659       771
fungal-gross-anatomy                          ALEI+           106       86
gene-regulation-ontology                      ALCHIQ(D)       962       544
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Ontology 


Expressivity 


^Axioms 


Sig. size 


general-formal-ontology 


SriXQ 


212 


86 


hom-datasource oshpd 


AC 


351 


361 


hom-datasource oshpdsc 


AC 


351 


360 


nom-dxprocs mdcdrg 


A r* 

AC 


774 


784 


hom-harvard 


AC 


189 


191 


L ■ jn II 

hom-icd9 procs oshpd 


AC 


4,642 


4,652 


hom-icd9cm-ecodes 


AC 


1,490 


1,500 


hom-icd9cm procedures 


AC 


4,644 


4,656 


hom-mdcdrg oshpd 


AC 


773 


782 


hom-oshpd-sc 


AC 


266 


278 


hom-oshpd usecase 


AC 


393 


408 


hom-procs2 oshpd 


AC 


4,642 


4,652 


hom-ucare 


AC 


64 


75 


hom mdcs-drgs


AC 


774 


780 


homerun-ontology 


AC 


1,194 


1,094 


host-pathogen-interactions-ontology


SMX 


403 


319 


human-developmental-anatomy-abs... 


ACE 


2,335 


2,316 


human-developmental-anatomy-tim... 


ACt 


8,339 


8,343 


human-disease 


AC 


6,753 


8,625 


hymenoptera-anatomy-ontology 


S1Z 


8,493 


4,324 


imgt-ontology 


ACCXNyD) 


1,112 


122 


infectious-disease-ontology




1,221 


640 


information-artifact-ontology 


ST-iOXN{T>) 


294 


197 


interaction-network-ontology 


ACC 


1,034 


981 


ixno 


AC 


39 


53 


leukocyte-surface-markers


AC+ 


472 


473 


linkingkin2pep


SriXTyV) 


30 


17 


lipid-ontology 


ACCrlXN 


2,375 


762 


loggerhead-nesting 


ACt 


347 


314 


maize-gross-anatomy 


ACt 


217 


184 


mass-spectrometry 


Sri 


4,447 


4,492 


medaka-fish-anatomy-and-dev... 


ALt 


4,4U2 


4,363 


mego 


ACS+ 


421 


370 


minimal-anatomical-terminology 


ACS 


504 


481 


molecule-role-inoh-protein-name... 


ACS+ 


9,629 


9,221


mouse-adult-gross-anatomy 




o, it D 




mouse-pathology


AC£+ 


808 


757 


multiple-alignment 


AC£+ 


168 


174 


neomark-oral-cancer-ontology 


SHXQ 


399 


352 


neural-electromagnetic-ontologies 


SHXQ(V) 


2,578 


1,766


neural-immune-gene-ontology 


SH 


8,835 


4,843 


neuro-behavior-ontology


AC 


768 


733 


nif-dysfunction 


S1ZOXT(V) 


2,635 


2,951 


nmr-instrument-specific-component... 


AC 


290 


301 


obo-relationship-types


ACK+ 


33 


26 


ontology-for-drug-discovery-investigations


SrlOXNiV) 


996 


837 


ontology-for-general-medical-science 


ACCO 


216 


162 


ontology-for-genetic-interval 


SriXM{V) 


509 


298 


ontology-for-microrna-target-prediction 


ACCX(V) 


415 


338 


ontology-for-parasite-lifecycle 


SUOXT 


855 


415 


ontology-of-general-purpose-datatypes 


ACCHOT 


459 


193 


ontology-of-geographical-region


AC 


38 


39 


ontology-of-glucose-metabolism-disorder 


AC 


132 


132 


ontology-of-medically-related-social-entities


ACCO


157 


99 


ontology-of-physics-for-biology


ACCUXQ{V) 


795 


545 


pathogen-transmission 


AC 


24 


28 


pediatric-terminology 


AC 


894 


891 


phare 


ACCUXT(V) 


459 


312 


phenotypic-quality 


SH 


1,831 


2,282 


phylogenetic-ontology 


AC 


I I 


83 


physicalfields 


ACT 


136 


78 


physico-chemical-process 


ACE 


734 


560 


pilot-ontology 


ACCXT(V) 


85 


39 


pko re 


ACCT 


771 


770 


plant-environmental-conditions 


AC 


499 


501 


plant-growth-and-development-stage 


ACS+ 


240 


285 


plant-ontology 


S 


2,215 


1,460 


plant-trait-ontology 


ACS 


1,290 


1,124 


platynereis-stage-ontology


ACS 


31 


18 


protein-modification


ACS+ 


1,986 


1,346 


protein-ontology 


ACCTip) 


689 


226 


protein-protein-interaction 


ACS+ 


1,007 


962 


pseudogene 


AC 


19 


23 


quantitative-imaging-biomarker... 


ACUXT(V) 


1,697 


1,381 


rat-strain-ontology 


ACS 


4,122 


3,004 


reproductive-trait-and-phenotype... 


AC 


91 


96 


sample-processing-and-sep... 


AC 


193 


194 


sequence-types-and-features 


SHX 


z,o4o 


2, lot 


sleep-domain-ontology 


SUXT(V) 


363 


256 


smoking-behavior-risk-ontology


ACS1+ 


185 


135 


software-ontology


SHOXQ(T>) 


3,446 


1,039 


spatial-ontology 


ACSUI+ 


235 


172 


spider-ontology 


ACS+ 


MO 


Ool 


student-health-record 


ACH(V) 


418 


382 


symptom-ontology 


AC 


839 


935 


syndromic-surveillance-ontology


ACXT{V) 


1,679 


364 


sysmo-jerm 


SI(V) 


417 


280 


systems-biology


AC 


587 


558 


systems-chemical-biology-chemogenomics 


SHXJV(V) 


489 


216 


taxonomic-rank-vocabulary


AC 


58 


59 


tick-gross-anatomy 


ACS+ 


948 


630 


tissue-microarray-ontology


ACX(V) 


60 


32 


tok ontology 


S1IXQ(V) 


466 


331 


translational-medicine-ontology 


snxN(p) 


499 


389 


units-of-measurement 


ACS 


343 


336 


units-ontology 


SHIT 


105 


88 


vertebrate-anatomy-ontology 


ACSTZ+ 


340 


234 


vertebrate-homologous-organ-groups 


AC8+ 


1,689 


1,186 


vertebrate-trait-ontology 


AC+ 


3,586 


3,072 


web-service-interaction-ontology 


ACSK+ 


29 


39 


wheat-trait 


AC 


175 


176 


xenopus-anatomy-and-development 


AC8+ 


2,243 


1,051 


yeast-phenotypes 


AC 


266 


300 



